Port 14.5 - 14.6 changes range#1621
Conversation
0829842 to
09e574f
Compare
fmgr_sql must make expanded-datum arguments read-only, because it's possible that the function body will pass the argument to more than one callee function. If one of those functions takes the datum's R/W property as license to scribble on it, then later callees will see an unexpected value, leading to wrong answers. From a performance standpoint, it'd be nice to skip this in the common case that the argument value is passed to only one callee. However, detecting that seems fairly hard, and certainly not something that I care to attempt in a back-patched bug fix. Per report from Adam Mackler. This has been broken since we invented expanded datums, so back-patch to all supported branches. Discussion: https://postgr.es/m/WScDU5qfoZ7PB2gXwNqwGGgDPmWzz08VdydcPFLhOwUKZcdWbblbo-0Lku-qhuEiZoXJ82jpiQU4hOjOcrevYEDeoAvz6nR0IU4IHhXnaCA=@mackler.email Discussion: https://postgr.es/m/187436.1660143060@sss.pgh.pa.us
Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION records to know if the transaction has modified the catalog, and that information is not serialized to snapshot. Therefore, after the restart, if the logical decoding decodes only the commit record of the transaction that has actually modified a catalog, we will miss adding its XID to the snapshot. Thus, we will end up looking at catalogs with the wrong snapshot. To fix this problem, this changes the snapshot builder so that it remembers the last-running-xacts list of the decoded RUNNING_XACTS record after restoring the previously serialized snapshot. Then, we mark the transaction as containing catalog changes if it's in the list of initial running transactions and its commit record has XACT_XINFO_HAS_INVALS. To avoid ABI breakage, we store the array of the initial running transactions in the static variables InitialRunningXacts and NInitialRunningXacts, instead of storing those in SnapBuild or ReorderBuffer. This approach has a false positive; we could end up adding the transaction that didn't change catalog to the snapshot since we cannot distinguish whether the transaction has catalog changes only by checking the COMMIT record. It doesn't have the information on which (sub) transaction has catalog changes, and XACT_XINFO_HAS_INVALS doesn't necessarily indicate that the transaction has catalog change. But that won't be a problem since we use snapshot built during decoding only to read system catalogs. On the master branch, we took a more future-proof approach by writing catalog modifying transactions to the serialized snapshot which avoids the above false positive. But we cannot backpatch it because of a change in the SnapBuild. Reported-by: Mike Oh Author: Masahiko Sawada Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi Backpatch-through: 10 Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
This was originally done in commit 0c20dd3 for 16 only, to eliminate duplicate code and as an infrastructure that makes it easier to write future tests. However, it has been suggested that it would be good to back-patch this testing infrastructure to aid future tests in back-branches. Backpatch to all supported versions. Author: Masahiko Sawada Reviewed by: Amit Kapila, Shi yu Discussion: https://postgr.es/m/CAD21AoC-fvAkaKHa4t1urupwL8xbAcWRePeETvshvy80f6WV1A@mail.gmail.com Discussion: https://postgr.es/m/E1oJBIf-0006sw-SA@gemulon.postgresql.org
Member tracking was added in PG 13. Reported-by: David G. Johnston Discussion: https://postgr.es/m/CAKFQuwY1YtxQHVWUFYvSnOjZ5VPpXjF33V52bSKEwFjK2K=1Aw@mail.gmail.com Author: David G. Johnston Backpatch-through: 13
Reported-by: David G. Johnston Discussion: https://postgr.es/m/CAKFQuwZ24UcfkoyLLSW3PMGQATomOcw1nuYFRuMev-NoOF+mYw@mail.gmail.com Author: David G. Johnston Backpatch-through: 14, partial to 13
Mention that the table is not modified if it already exists. Reported-by: frank_limpert@yahoo.com Discussion: https://postgr.es/m/164441177106.9677.5991676148704507229@wrigleys.postgresql.org Backpatch-through: 10
Somehow this was in the syntax but had no description. Reported-by: robertcorrington@gmail.com Discussion: https://postgr.es/m/164228771825.31954.2719791849363756957@wrigleys.postgresql.org Backpatch-through: 10
The use of file 'config.pl' was not clearly explained. Reported-by: liambowen@gmail.com Discussion: https://postgr.es/m/164246013804.31952.4958087335645367498@wrigleys.postgresql.org Backpatch-through: 10
Reported-by: Simon Riggs Discussion: https://postgr.es/m/CANP8+jJESuuXYq9Djvf-+tx2vY2OFLmfEuu+UvwHNJ1RT7iJCQ@mail.gmail.com Author: Simon Riggs Backpatch-through: 10
Reported-by: Jonathan S. Katz Discussion: https://postgr.es/m/c59ffbd5-96ac-a5a5-a401-14f627ca1405@postgresql.org Backpatch-through: 11
Reported-by: Shinya Kato Discussion: https://postgr.es/m/1ecdb1ff78e9b03dfce37e85eaca725a@oss.nttdata.com Author: Shinya Kato Backpatch-through: 10
96ef3b8 accidentally copied a not applicable comment from the float8_pass_by_value code to the data_checksums code. Remove that. 87d3b35 changed pg_upgrade to checking the checksum version rather than just the Boolean presence of checksums, but didn't change the field type in its ControlData struct from bool. So this would not work correctly if there ever is a checksum version larger than 1.
Most parts of the parser can expect that the stack overflow check in transformExprRecurse() will trigger before things get desperate. However, transformFromClauseItem() can recurse directly to self without having analyzed any expressions, so it's possible to drive it to a stack-overrun crash. Add a check to prevent that. Per bug #17583 from Egor Chindyaskin. Back-patch to all supported branches. Richard Guo Discussion: https://postgr.es/m/17583-33be55b9f981f75c@postgresql.org
It's possible to reach this case when work_mem is very small and tupsize is (relatively) very large. In that case ExecChooseHashTableSize would get an assertion failure, or with asserts off it'd compute nbuckets = 0, which'd likely cause misbehavior later (I've not checked). To fix, clamp the number of buckets to be at least 1. This is due to faulty conversion of old my_log2() coding in 28d9360. Back-patch to v13, as that was. Zhang Mingli Discussion: https://postgr.es/m/beb64ca0-91e2-44ac-bf4a-7ea36275ec02@Spark
This option switch supports a total of 8 values, as told by set_plan_disabling_options() and the documentation, but this was not reflected in the output generated by --help. Author: Junwang Zhao Discussion: https://postgr.es/m/CAEG8a3+pT3cWzyjzKs184L1XMNm8NDnoJLiSjAYSO7XqpRh_vA@mail.gmail.com Backpatch-through: 10
There's a convention that externally-visible libpq functions should check for a NULL PGconn pointer, and fail gracefully instead of crashing. PQflush() and PQisnonblocking() didn't get that memo though. Also add a similar check to PQdefaultSSLKeyPassHook_OpenSSL; while it's not clear that ordinary usage could reach that with a null conn pointer, it's cheap enough to check, so let's be consistent. Daniele Varrazzo and Tom Lane Discussion: https://postgr.es/m/CA+mi_8Zm_mVVyW1iNFgyMd9Oh0Nv8-F+7Y3-BqwMgTMHuo_h2Q@mail.gmail.com
In ref/create_sequence.sgml <literal> tag was used for nextval function name. This should have been <function> tag. Author: Noboru Saito Discussion: https://postgr.es/m/CAAM3qnJTDFFfRf5JHJ4AYrNcqXgMmj0pbH0%2Bvm%3DYva%2BpJyGymA%40mail.gmail.com Backpatch-through: 10
The current publisher code checks if UPDATE or DELETE can be executed with the replica identity of the table even if it's a partitioned table. We can skip checking the replica identity for partitioned tables because the operations are actually performed on the leaf partitions (not the partitioned table). Reported-by: Brad Nicholson Author: Hou Zhijie Reviewed-by: Peter Smith, Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CAMMnM%3D8i5DohH%3DYKzV0_wYuYSYvuOJoL9F5nzXTc%2ByzsG1f6rg%40mail.gmail.com
The tty connection string parameter was removed in commit 14d9b37 but the reference to it in the docs was mistakenly kept. Fix by removing it from the libpq documentation. Backpatch through v14 where the parameter was removed. Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com> Discussion: https://postgr.es/m/DM4PR84MB173433216FCC2A3961879000EE6B9@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM Backpatch-through: 14
The assert, introduced by 9f1cf97, is intended to check if the prefix is terminated by a \0 byte, but it has two flaws. Firstly, prefix_size includes the \0 byte, so prefix[prefix_size] points to the byte after the null byte. Secondly, the check ensures the byte is not equal \0, while it should be checking the opposite. Backpatch-through: 14 Discussion: https://postgr.es/m/b99b6101-2f14-3796-3dfa-4a6cd7d4326d@enterprisedb.com
When creating a partitioned index, DefineIndex tries to identify any existing indexes on the partitions that match the partitioned index, so that it can absorb those as child indexes instead of building new ones. Part of the matching is to compare IndexInfo structs --- but that wasn't done quite right. We're comparing the IndexInfo built within DefineIndex itself to one made from existing catalog contents by BuildIndexInfo. Notably, while BuildIndexInfo will run index expressions and predicates through expression preprocessing, that has not happened to DefineIndex's struct. The result is failure to match and subsequent creation of duplicate indexes. The easiest and most bulletproof fix is to build a new IndexInfo using BuildIndexInfo, thereby guaranteeing that the processing done is identical. While here, let's also extract the opfamily and collation data from the new partitioned index, removing ad-hoc logic that duplicated knowledge about how those are constructed. Per report from Christophe Pettus. Back-patch to v11 where we invented partitioned indexes. Richard Guo and Tom Lane Discussion: https://postgr.es/m/8864BFAA-81FD-4BF9-8E06-7DEB8D4164ED@thebuild.com
While decoding changes in a loop, if we skip all the changes there is no CFI making the loop uninterruptible. Reported-by: Whale Song and Andrey Borodin Bug: 17580 Author: Masahiko Sawada Reviwed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/17580-849c1d5b6d7eb422@postgresql.org Discussion: https://postgr.es/m/B319ECD6-9A28-4CDF-A8F4-3591E0BF2369@yandex-team.ru
sysctl is more portable than Linux's /proc/sys file tree, and often easier to use too. That's why most of our docs refer to sysctl when talking about how to adjust kernel parameters. Bring the few stragglers into line. Discussion: https://postgr.es/m/361175.1661187463@sss.pgh.pa.us
On fast machines, it's possible for applications such as pgbench to issue connection requests so quickly that the postmaster's listen queue overflows in the kernel, resulting in unexpected failures (with not-very-helpful error messages). Most modern OSes allow the queue size to be increased, so document how to do that. Per report from Kevin McKibbin. Discussion: https://postgr.es/m/CADc_NKg2d+oZY9mg4DdQdoUcGzN2kOYXBu-3--RW_hEe0tUV=g@mail.gmail.com
SplitToVariants() in the ispell code, lseg_inside_poly() in geo_ops.c, and regex_selectivity_sub() in selectivity estimation could recurse until stack overflow; fix by adding check_stack_depth() calls. So could next() in the regex compiler, but that case is better fixed by converting its tail recursion to a loop. (We probably get better code that way too, since next() can now be inlined into its sole caller.) There remains a reachable stack overrun in the Turkish stemmer, but we'll need some advice from the Snowball people about how to fix that. Per report from Egor Chindyaskin and Alexander Lakhin. These mistakes are old, so back-patch to all supported branches. Richard Guo and Tom Lane Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru
While waiting for slots to become available in wait_on_slots() in parallel_slot.c, the cancellation always relied on the first connection in the set to do the job. This could cause problems when this slot's socket is gone as PQgetCancel() would return NULL in this case. Rather than always using the first connection, this changes the logic to use the first valid connection for the cancellation. Author: Ranier Vilela Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CAEudQAokk1h_pUwGXsYS4oVOuf35s1O2o3TXGHpV8=AWikvgHA@mail.gmail.com Backpatch-through: 14
Compute total number of sub-parts correctly, per jason@banfelder.net Simon Riggs Discussion: https://postgr.es/m/166161184718.1235920.6304070286124217754@wrigleys.postgresql.org
When a PostgreSQL instance performing archive recovery but not using standby mode is promoted, and the last WAL segment that it attempted to read ended in a partial record, the previous code would create invalid WAL on the new timeline. The WAL from the previously timeline would be copied to the new timeline up until the end of the last valid record, but instead of beginning to write WAL at immediately afterwards, the promoted server would write an overwrite contrecord at the beginning of the next segment. The end of the previous segment would be left as all-zeroes, resulting in failures if anything tried to read WAL from that file. The root of the issue is that ReadRecord() decides whether to set abortedRecPtr and missingContrecPtr based on the value of StandbyMode, but ReadRecord() switches to a new timeline based on the value of ArchiveRecoveryRequested. We shouldn't try to write an overwrite contrecord if we're switching to a new timeline, so change the test in ReadRecod() to check ArchiveRecoveryRequested instead. Code fix by Dilip Kumar. Comments by me incorporating suggested language from Álvaro Herrera. Further review from Kyotaro Horiguchi and Sami Imseih. Discussion: http://postgr.es/m/CAFiTN-t7umki=PK8dT1tcPV=mOUe2vNhHML6b3T7W7qqvvajjg@mail.gmail.com Discussion: http://postgr.es/m/FB0DEA0B-E14E-43A0-811F-C1AE93D00FF3%40amazon.com
The default of lazy symbol resolution means that when the postmaster first reaches the select() call in ServerLoop, it'll need to resolve the link to that libc entry point. NetBSD's dynamic loader takes an internal lock while doing that, and if a signal interrupts the operation then there is a risk of self-deadlock should the signal handler do anything that requires that lock, as several of the postmaster signal handlers do. The window for this is pretty narrow, and timing considerations make it unlikely that a signal would arrive right then anyway. But it's semi-repeatable on slow single-CPU machines, and in principle the race could happen with any hardware. The least messy solution to this is to force binding of dynamic symbols at postmaster start, using the "-z now" linker option. While we're at it, also use "-z relro" so as to provide a small security gain. It's not entirely clear whether any other platforms share this issue, but for now we'll assume it's NetBSD-specific. (We might later try to use "-z now" on more platforms for performance reasons, but that would not likely be something to back-patch.) Report and patch by me; the idea to fix it this way is from Andres Freund. Discussion: https://postgr.es/m/3384826.1661802235@sss.pgh.pa.us
|
There is 95 commits already... I don't think we need more to this PR. |
|
It’s weird that the tests failed, but there is no regression.diff files. |
398b6fb to
904f1cf
Compare
There are regression.diffs, one can download them from CI links, but show-regression-diffs step does not find them |
|
Hi @reshke, for the minor kernel upgrade, we can also update the version string |
The executor will dump core if it's asked to execute a seqscan on a relation having no table AM, such as a view. While that shouldn't really happen, it's possible to get there via catalog corruption, such as a missing ON SELECT rule. It seems worth installing a defense against that. There are multiple plausible places for such a defense, but I picked the planner's get_relation_info(). Per discussion of bug #17646 from Kui Liu. Back-patch to v12 where the tableam APIs were introduced; in older versions you won't get a SIGSEGV, so it seems less pressing. Discussion: https://postgr.es/m/17646-70c93cfa40365776@postgresql.org
DefineQueryRewrite() has long required that ON SELECT rules be named "_RETURN". But we overlooked the converse case: we should forbid non-ON-SELECT rules that are named "_RETURN". In particular this prevents using CREATE OR REPLACE RULE to overwrite a view's _RETURN rule with some other kind of rule, thereby breaking the view. Per bug #17646 from Kui Liu. Back-patch to all supported branches. Discussion: https://postgr.es/m/17646-70c93cfa40365776@postgresql.org
The fix is to run ANALYZE. Discussion: https://postgr.es/m/YzRr+ys98UzVQJvK@momjian.us, https://postgr.es/m/flat/CAKJS1f8DTbCHf9gedU0He6ARsd58E6qOhEHM1caomqj_r9MOiQ%40mail.gmail.com, https://postgr.es/m/CAKJS1f80o98hcfSk8j%3DfdN09S7Sjz%2BvuzhEwbyQqvHJb_sZw0g%40mail.gmail.com Backpatch-through: 10
It was previously easily overlooked at the end of several tables. Reported-by: Alex Denman Discussion: https://postgr.es/m/166335888474.659.16897487975376230364@wrigleys.postgresql.org Backpatch-through: 10
Per https://llvm.org/docs/OpaquePointers.html, support for non-opaque pointers still exists and we can request that on our context. We have until LLVM 16 to move to opaque pointers, a much larger change. Back-patch to 11, where LLVM support arrived. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAMHz58Sf_xncdyqsekoVsNeKcruKootLtVH6cYXVhhUR1oKPCg%40mail.gmail.com
…oding. When the logical decoding restarts from NEW_CID, since there is no association between the top transaction and its subtransaction, both are created as top transactions and have the same LSN. This caused the assertion failure in AssertTXNLsnOrder(). This patch skips the assertion check until we reach the LSN at which we start decoding the contents of the transaction, specifically start_decoding_at LSN in SnapBuild. This is okay because we don't guarantee to make the association between top transaction and subtransaction until we try to decode the actual contents of transaction. The ordering of the records prior to the start_decoding_at LSN should have been checked before the restart. The other assertion failure is due to the reason that we forgot to track that we have considered top-level transaction id in the list of catalog changing transactions that were committed when one of its subtransactions is marked as containing catalog change. Reported-by: Tomas Vondra, Osumi Takamichi Author: Masahiko Sawada, Kuroda Hayato Reviewed-by: Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi, Masahiko Sawada Backpatch-through: 10 Discussion: https://postgr.es/m/a89b46b6-0239-2fd5-71a9-b19b1f7a7145%40enterprisedb.com Discussion: https://postgr.es/m/TYCPR01MB83733C6CEAE47D0280814D5AED7A9%40TYCPR01MB8373.jpnprd01.prod.outlook.com
56788d2 adjusted the parallel seq scan code so that instead of handing out a single block at a time to parallel workers, it now hands out ranges of blocks. Here we update the documentation which still claimed that workers received just 1 block at a time. Reported-by: Zhang Mingli Discussion: https://postgr.es/m/17c99615-2c3b-4e4e-9d0b-424a66a7bccd@Spark Backpatch-through: 14, where 56788d2 was added.
…ng decoding. This problem has been introduced by commit 272248a0c1 where we started assigning the subtransactions to the top-level transaction when we mark both the top-level transaction and its subtransactions as containing catalog changes. After we assign subtransactions to the top-level transaction, we were not allowed to execute any invalidations associated with it when we decide to skip the transaction. The reason to assign the subtransactions to the top-level transaction was to avoid the assertion failure in AssertTXNLsnOrder() as they have the same LSN when we sometimes start accumulating transaction changes for partial transactions after the restart. Now that with commit 64ff0fe4e8, we skip this assertion check until we reach the LSN at which we start decoding the contents of the transaction, so, there is no reason for such an assignment anymore. The assignment change was introduced in 15 and prior versions but this bug doesn't exist in branches prior to 14 since we don't add invalidation messages to subtransactions. We decided to backpatch through 11 for consistency but not for 10 since its final release is near. Reported-by: Kuroda Hayato Author: Masahiko Sawada Reviewed-by: Amit Kapila Backpatch-through: 11 Discussion: https://postgr.es/m/TYAPR01MB58660803BCAA7849C8584AA4F57E9%40TYAPR01MB5866.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/a89b46b6-0239-2fd5-71a9-b19b1f7a7145%40enterprisedb.com
Previously in commit 42681dffaf, we added CFI during decoding changes but missed another similar case that can happen while restoring changes spilled to disk back into memory in a loop. Reported-by: Robert Haas Author: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CA+TgmoaLObg0QbstbC8ykDwOdD1bDkr4AbPpB=0DPgA2JW0mFg@mail.gmail.com
904f1cf to
e95d8b9
Compare
|
Had excluded postgres/postgres@8122160ffb6 and marked it "low-priority" in excel. Will reconsider later |
well, ABI checker doesnt like it: |
|
We can consider reverting 14.4 - 14.5 version bump... or ignore ABI check |
Good catch! For this PR, we can revert the minor version bump. Let’s see how to solve it further later. |
b9c8585 to
e95d8b9
Compare
Next minor changes after #1574